Stemming of French Words Based on Grammatical Categories

Author

  • Jacques Savoy
Abstract

Automatic indexing systems use suffix stripping algorithms to cluster various words derived from a common root under the same stem. Currently, removing affixes is either a context-free or a context-sensitive operation, where the context refers to the remaining stem. In this article, we propose a suffixing algorithm which uses grammatical categories to enhance the stemming process. This approach supports the use of foreign languages. In our case, the language is French, and a morphological analysis is required for removing inflectional suffixes or morphosyntactic variants of a lemma. After this analysis, we implement a suffix stripping algorithm which uses a dictionary and the grammatical categories to remove derivational suffixes. Our approach always returns a linguistically correct lemma, but not necessarily the “right” one. Based on our tests, this solution is an attractive one, with a mean error rate of 16%. We finish by explaining why we cannot expect significantly better results with this approach.
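The two-step process described in the abstract (inflectional analysis first, then dictionary-checked removal of derivational suffixes guided by grammatical category) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the rule tables, and the function names `lemmatize`, `strip_derivation`, and `stem` are hypothetical and are not taken from the article.

```python
# Hypothetical toy lexicon: lemma -> set of grammatical categories.
LEXICON = {
    "rapide": {"ADJ"},
    "rapidement": {"ADV"},
    "chanter": {"VERB"},
    "chanteur": {"NOUN"},
    "cheval": {"NOUN"},
}

# Hypothetical inflectional rules: (suffix, replacement, category of the lemma).
INFLECTION_RULES = [
    ("aux", "al", "NOUN"),  # chevaux -> cheval
    ("s", "", "NOUN"),      # chanteurs -> chanteur
    ("s", "", "ADJ"),       # rapides -> rapide
]

# Hypothetical derivational rules:
# (suffix, replacement, category of the derived word, category of the base).
DERIVATION_RULES = [
    ("ement", "e", "ADV", "ADJ"),   # rapidement -> rapide
    ("eur", "er", "NOUN", "VERB"),  # chanteur -> chanter
]


def lemmatize(word):
    """Step 1: return (lemma, category) pairs that the lexicon attests."""
    results = [(word, cat) for cat in sorted(LEXICON.get(word, ()))]
    for suffix, repl, cat in INFLECTION_RULES:
        if word.endswith(suffix):
            cand = word[: len(word) - len(suffix)] + repl
            if cat in LEXICON.get(cand, ()) and (cand, cat) not in results:
                results.append((cand, cat))
    return results


def strip_derivation(lemma, category):
    """Step 2: remove a derivational suffix only if the rule matches the
    lemma's category and the result is in the lexicon with the base category."""
    for suffix, repl, src, dst in DERIVATION_RULES:
        if category == src and lemma.endswith(suffix):
            cand = lemma[: len(lemma) - len(suffix)] + repl
            if dst in LEXICON.get(cand, ()):
                return cand
    return lemma


def stem(word):
    """Combine both steps; unknown words are returned unchanged."""
    analyses = lemmatize(word)
    if not analyses:
        return word
    lemma, category = analyses[0]  # the sketch simply takes the first analysis
    return strip_derivation(lemma, category)
```

Because every candidate is checked against the lexicon, the sketch only ever returns an attested word, which mirrors the abstract's remark that the result is a linguistically correct lemma but not necessarily the intended one. For example, `stem("chanteurs")` goes through `chanteur` (noun) and ends at `chanter` (verb).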

Similar resources

Categorizing words using 'frequent frames': what cross-linguistic analyses reveal about distributional acquisition strategies.

Mintz (2003) described a distributional environment called a frame, defined as the co-occurrence of two context words with one intervening target word. Analyses of English child-directed speech showed that words that fell within any frequently occurring frame consistently belonged to the same grammatical category (e.g. noun, verb, adjective, etc.). In this paper, we first generalize this result...

Prominence perception and accent detection in French. A corpus-based account

The goal of this paper is to shed new light on accentuation in French, more precisely to discuss the role of grammatical constraints and of phonetic factors implicated in the perception of French final and non-final accent. The study is based on the analysis of a 70-minute long corpus, including various speaking styles. The corpus has been annotated manually and automatically for prominence det...

The phonological-distributional coherence hypothesis: cross-linguistic evidence in language acquisition.

Several phonological and prosodic properties of words have been shown to relate to differences between grammatical categories. Distributional information about grammatical categories is also a rich source in the child's language environment. In this paper we hypothesise that such cues operate in tandem for developing the child's knowledge about grammatical categories. We term this the Phonologi...

Grammatical Gender Affects Bilinguals’ Conceptual Gender: Implications for Linguistic Relativity and Decision Making

We used a non-linguistic gender attribution task to determine how French and Spanish grammatical gender affects bilinguals’ conceptual gender. French-English and Spanish-English bilingual adults, as well as English monolingual adults, were asked to assign a male or female voice to 32 color drawings depicting people, animals, and common objects. French-English and Spanish-English bilinguals classified it...

Ambiguous function words do not prevent 18-month-olds from building accurate syntactic category expectations: An ERP study.

To comprehend language, listeners need to encode the relationship between words within sentences. This entails categorizing words into their appropriate word classes. Function words, consistently preceding words from specific categories (e.g., the ball_NOUN, I speak_VERB), provide invaluable information for this task, and children's sensitivity to such adjacent relationships develops early on in ...

Journal:
  • JASIS

Volume 44

Publication date: 1993